Classification - HELOC Credit Risk
Predicting credit risk for Home Equity Line of Credit applications using the FICO HELOC dataset.
Dataset Source: FICO HELOC Dataset
Problem Type: Classification
Target Variable: RiskPerformance - Whether the applicant will pay as negotiated (Good/Bad)
Use Case: Credit risk assessment for financial institutions to identify borrowers at risk of defaulting
Package Imports
Install and import relevant packages
!pip install xplainable
!pip install xplainable-client
import pandas as pd
from sklearn.model_selection import train_test_split
import requests
import json
import xplainable as xp
from xplainable.core.models import XClassifier
from xplainable.core.optimisation.bayesian import XParamOptimiser
from xplainable.preprocessing.pipeline import XPipeline
from xplainable.preprocessing import transformers as xtf
import xplainable_client
Data Loading and Exploration
Load the HELOC dataset and explore its structure
# Load dataset
data = pd.read_csv('https://xplainable-public-storage.syd1.digitaloceanspaces.com/example_data/heloc_dataset.csv')
# Display basic information
print(f"Dataset shape: {data.shape}")
print(f"Target distribution:\n{data['RiskPerformance'].value_counts()}")
data.head()
The fields are defined below:
Variable Names | Description |
---|---|
RiskPerformance | Paid as negotiated flag (12-36 Months). String of Good and Bad |
ExternalRiskEstimate | Consolidated version of risk markers |
MSinceOldestTradeOpen | Months Since Oldest Trade Open |
MSinceMostRecentTradeOpen | Months Since Most Recent Trade Open |
AverageMInFile | Average Months in File |
NumSatisfactoryTrades | Number of Satisfactory Trades |
NumTrades60Ever2DerogPubRec | Number of Trades 60+ Ever |
NumTrades90Ever2DerogPubRec | Number of Trades 90+ Ever |
PercentTradesNeverDelq | Percent of Trades Never Delinquent |
MSinceMostRecentDelq | Months Since Most Recent Delinquency |
MaxDelq2PublicRecLast12M | Max Delinquency/Public Records in the Last 12 Months. See tab 'MaxDelq' for each category |
MaxDelqEver | Max Delinquency Ever. See tab 'MaxDelq' for each category |
NumTotalTrades | Number of Total Trades (total number of credit accounts) |
NumTradesOpeninLast12M | Number of Trades Open in Last 12 Months |
PercentInstallTrades | Percent of Installment Trades |
MSinceMostRecentInqexcl7days | Months Since Most Recent Inquiry excluding the last 7 days |
NumInqLast6M | Number of Inquiries in the Last 6 Months |
NumInqLast6Mexcl7days | Number of Inquiries in the Last 6 Months excluding the last 7 days. Excluding the last 7 days removes inquiries that are likely due to price comparison shopping. |
NetFractionRevolvingBurden | This is the revolving balance divided by the credit limit |
NetFractionInstallBurden | This is the installment balance divided by the original loan amount |
NumRevolvingTradesWBalance | Number of Revolving Trades with Balance |
NumInstallTradesWBalance | Number of Installment Trades with Balance |
NumBank2NatlTradesWHighUtilization | Number of Bank/National Trades with high utilization ratio |
PercentTradesWBalance | Percent of Trades with Balance |
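The two burden fields in the table are simple ratios, and a small worked example makes the definitions concrete. The sketch below assumes the ratio is expressed as a percentage, which is an assumption for illustration rather than something stated in the table; burden_percent and the dollar amounts are hypothetical.

```python
def burden_percent(balance, reference_amount):
    """Balance as a percentage of a reference amount.

    For NetFractionRevolvingBurden the reference is the credit limit;
    for NetFractionInstallBurden it is the original loan amount.
    Percentage scaling is an assumption made for this illustration.
    """
    if reference_amount <= 0:
        raise ValueError("reference_amount must be positive")
    return 100.0 * balance / reference_amount


# Hypothetical applicant: 2,500 owed on revolving accounts with a 10,000 limit
revolving_burden = burden_percent(2_500, 10_000)

# Hypothetical applicant: 9,000 still owed on a 12,000 installment loan
install_burden = burden_percent(9_000, 12_000)

print(revolving_burden, install_burden)
```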
1. Data Preprocessing
Prepare features and target variable
# Separate the target column from the feature columns
y = data['RiskPerformance']
X = data.drop(columns=['RiskPerformance'])
Create Train/Test Split
X, y = data.drop(columns=['RiskPerformance']), data['RiskPerformance']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
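The stratify=y argument asks train_test_split to keep the Good/Bad proportions the same in the training and test sets. The sketch below reimplements that idea in plain Python on hypothetical labels, purely to show the behaviour. It is not scikit-learn's implementation, and stratified_split is a made-up helper.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same ratio."""
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 60 "Bad" and 40 "Good" applicants
labels = ["Bad"] * 60 + ["Good"] * 40
train_idx, test_idx = stratified_split(labels)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(train_counts, test_counts)
```

Both splits keep the 60/40 class ratio of the full label list, which is the property stratification is meant to guarantee.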
2. Model Optimization
The XParamOptimiser fine-tunes the hyperparameters of our model to achieve optimal performance.
opt = XParamOptimiser(metric='f1-score', n_trials=300, n_folds=2, early_stopping=150)
params = opt.optimise(X_train, y_train)
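The optimiser above is scored with metric='f1-score'. F1 is the harmonic mean of precision and recall, and the sketch below works through the calculation by hand on hypothetical predictions. Which class xplainable treats as positive is not stated here, so the choice of "Bad" is an assumption, and this f1_score helper is illustrative rather than the library's own code.

```python
def f1_score(y_true, y_pred, positive="Bad"):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predictions for eight applicants
y_true = ["Bad", "Bad", "Bad", "Good", "Good", "Good", "Good", "Bad"]
y_pred = ["Bad", "Bad", "Good", "Good", "Good", "Bad", "Good", "Bad"]

# 3 true positives, 1 false positive, 1 false negative
score = f1_score(y_true, y_pred)
print(score)
```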
3. Model Training
Train the XClassifier with optimized parameters.
model = XClassifier(**params)
model.fit(X_train, y_train)
4. Model Interpretability and Explainability
Generate insights into the model's decision-making process and understand feature importance.
model.explain()
Analysing Feature Importances and Contributions
Click on the bars to see the importances and contributions of each variable.
Feature Importances
The relative significance of each feature (or input variable) in making predictions. It indicates how much each feature contributes to the model’s predictions, with higher values implying greater influence.
Feature Contributions
The effect of each feature on individual predictions. For instance, in this model, feature contributions show how each feature (such as NetFractionRevolvingBurden, the revolving balance divided by the credit limit) affects the predicted risk for a particular applicant.
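To make the distinction between the two views concrete, the sketch below uses a toy additive score in which each prediction is a base value plus one contribution per feature. The numbers are hypothetical, and summarising importance as the normalised mean absolute contribution is one common convention chosen for illustration. It should not be read as how xplainable computes the values shown by model.explain().

```python
# Hypothetical per-applicant contributions to a risk score (not real model output)
contributions = [
    {"ExternalRiskEstimate": -0.25, "NetFractionRevolvingBurden": 0.125, "NumInqLast6M": 0.0},
    {"ExternalRiskEstimate": 0.125, "NetFractionRevolvingBurden": 0.125, "NumInqLast6M": 0.125},
    {"ExternalRiskEstimate": -0.25, "NetFractionRevolvingBurden": -0.125, "NumInqLast6M": 0.0},
]
base_value = 0.5  # hypothetical score before any feature is considered

# Contributions are local: each applicant's score is the base plus their own contributions
scores = [base_value + sum(row.values()) for row in contributions]

# Importances are global: here, the mean absolute contribution across applicants,
# normalised so the importances sum to 1
features = list(contributions[0])
mean_abs = {
    f: sum(abs(row[f]) for row in contributions) / len(contributions)
    for f in features
}
total = sum(mean_abs.values())
importances = {f: value / total for f, value in mean_abs.items()}

print(scores)
print(importances)
```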
5. Model Persistence
Save the model to Xplainable Cloud for collaboration and deployment.
In this step, we register our HELOC risk prediction model with Xplainable Cloud using client.create_model, passing the fitted model, a name and description, and the training data. The model predicts the likelihood of applicants defaulting on their line of credit. The returned value, stored in model_id, includes a version_id that identifies this particular iteration of the model, allowing us to track and manage different versions systematically.
Xplainable Cloud Setup
# Initialize Xplainable Cloud client
client = xplainable_client.Client(
api_key="", # Add your API key from https://platform.xplainable.io/
)
# Create a model
model_id = client.create_model(
model=model,
model_name="HELOC Credit Risk Model",
model_description="Predicting applicant credit risk for HELOC applications",
x=X_train,
y=y_train
)
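Rather than pasting an API key into the notebook, the key can be read from the environment so it never ends up in shared code. The sketch below is one way to do that; the XPLAINABLE_API_KEY variable name and the load_api_key helper are hypothetical and not part of xplainable_client.

```python
import os


def load_api_key(env_var="XPLAINABLE_API_KEY"):
    """Read the API key from an environment variable (name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return key


# Usage, assuming the variable has been set in the shell or notebook environment:
# client = xplainable_client.Client(api_key=load_api_key())
```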
6. Model Deployment
Deploy the model for real-time predictions.
The code block below deploys our HELOC credit risk model using client.deploy, passing the version_id obtained in the previous step. This creates the model's endpoint so that, once activated, it can receive and process prediction requests. The output confirms the deployment with a deployment_id and reports the deployment's status (initially 'inactive'), its location, and the endpoint URL where it can be accessed.
model_id
deployment = client.deploy(
model_version_id=model_id["version_id"]
)
deployment
7. Model Testing
Test the deployed model with sample predictions.
- Activating the Deployment: The model deployment is activated using client.activate_deployment, which changes the deployment status to active, allowing it to accept prediction requests.
client.activate_deployment(deployment['deployment_id'])
- Creating a Deployment Key: A deployment key is generated with client.generate_deploy_key. This key is required to authenticate and make secure requests to the deployed model.
deploy_key = client.generate_deploy_key(deployment['deployment_id'],'HELOC Deploy Key', 7)
- Generating Example Payload: An example payload for a deployment request is generated by client.generate_example_deployment_payload. This payload mimics the input data structure the model expects when making predictions.
# Set the option to choose between two ways of creating the payload
option = 2
if option == 1:
    # Ask the client for an example payload for this deployment
    body = client.generate_example_deployment_payload(deployment['deployment_id'])
else:
    # Sample one row from the dataset, dropping the target column
    body = json.loads(data.drop(columns=["RiskPerformance"]).sample(1).to_json(orient="records"))
body
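With orient="records", the sampled row is serialised as a JSON array holding one object per row, keyed by column name. The sketch below builds the same shape by hand from a hypothetical applicant so the structure is easy to see. Only a few of the feature columns are shown and the values are invented for illustration.

```python
import json

# Hypothetical applicant row, including the target column
row = {
    "RiskPerformance": "Bad",
    "ExternalRiskEstimate": 55,
    "MSinceOldestTradeOpen": 144,
    "NumInqLast6M": 0,
    "NetFractionRevolvingBurden": 33,
}

# Drop the target so only feature columns are sent for prediction
features = {name: value for name, value in row.items() if name != "RiskPerformance"}

# A records-oriented payload is a list with one dict per row
body = [features]
payload = json.dumps(body)
print(payload)
```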
- Making a Prediction Request: A POST request is made to the model's prediction endpoint with the example payload, authenticated with the deployment key. The model processes the input data and returns a prediction response, which includes the predicted class (for this model, a RiskPerformance label) and the prediction probabilities for each class.
response = requests.post(
url="https://inference.xplainable.io/v1/predict",
headers={'api_key': deploy_key['deploy_key']},
json=body
)
value = response.json()
value
SaaS Deployment Info
The SaaS application interface mirrors the operations performed programmatically in the earlier steps. It provides a dashboard for managing the deployed model, covering actions from deployment to testing within a web interface. This makes it accessible to non-technical users who prefer to manage model deployments and monitor performance through a graphical interface rather than code. The deployment checklist, example payload, and prediction response are all available in the application, giving users control and visibility over the deployment lifecycle and model interactions.